Level Up Your Docs

Jacki Buros

Generable Inc

Our goals

  1. Reliable, readable code
  2. Nice UI (clean)
  3. Clear messaging
  4. Wow factor
  5. Easy to update with new data

However

Things don’t always go as planned …

Poll

How many of you routinely:

  1. Use git or another version control system?
  2. Use Quarto or R Markdown?
  3. Use testthat or another testing framework?

Plan for today

  1. Table stakes
  2. Quick wins
  3. Eye candy
  4. Other

Table stakes

Does the doc render?

Will it render again?

Try it

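  1. Create the .github/workflows directory
mkdir -p .github/workflows
  2. Save this YAML file in that directory (for example, as render.yml)
name: Render Quarto project

on:
  push:
    branches: [main]  # assumes your default branch is main
  pull_request:
  workflow_dispatch:

jobs:
  build-deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Check out repository
        uses: actions/checkout@v4
      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2
      - name: Set up R
        uses: r-lib/actions/setup-r@v2
      # Assumes the project uses renv: installs the packages listed in renv.lock
      - name: Install R packages
        uses: r-lib/actions/setup-renv@v2
      - name: Render Quarto project
        uses: quarto-dev/quarto-actions/render@v2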

stopifnot

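library(tidyverse)
library(readxl)
library(gt)

# load data
data(sp500, package = 'gt')

# do some work
# read_excel() detects the file type; read_xls() only reads legacy .xls files
us_spending <- read_excel(here::here('data', 'hist01z1_fy2024.xlsx'))
d <- sp500 |>
  mutate(year = year(date)) |>
  left_join(us_spending, by = 'year')

# check! did the join change the number of records?
stopifnot(nrow(sp500) == nrow(d))

Check whether your number of records has changed! A left join adds rows whenever a key (here, year) appears more than once in the right-hand table.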
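A minimal, self-contained sketch of why the row-count check matters, using small made-up tables instead of the S&P and spending data. It uses base R's merge(all.x = TRUE) as the left join so it runs without extra packages; assert_unique_key() is a hypothetical helper, not part of any package.

```r
# Toy tables (hypothetical): one row per year on the left, and a
# lookup table in which 2021 accidentally appears twice.
prices <- data.frame(year = c(2020, 2021, 2022), value = c(1.0, 2.0, 3.0))
spending <- data.frame(year = c(2020, 2021, 2021, 2022),
                       outlays = c(10, 20, 21, 30))

# Hypothetical helper: fail early if a join key is not unique.
assert_unique_key <- function(df, key) {
  dups <- unique(df[[key]][duplicated(df[[key]])])
  if (length(dups) > 0) {
    stop("Duplicated values in key '", key, "': ",
         paste(dups, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Left join in base R: merge(all.x = TRUE) keeps every row of `prices`
joined <- merge(prices, spending, by = "year", all.x = TRUE)
nrow(joined)  # 4 rows, not 3: the 2021 row was duplicated

# Both checks catch the problem; tryCatch() only captures the messages here
row_check <- tryCatch({ stopifnot(nrow(prices) == nrow(joined)); "ok" },
                      error = conditionMessage)
key_check <- tryCatch({ assert_unique_key(spending, "year"); "ok" },
                      error = conditionMessage)
row_check  # "nrow(prices) == nrow(joined) is not TRUE"
key_check  # "Duplicated values in key 'year': 2021"
```

Checking the key before the join tells you why the count will change; checking the row count after the join catches the problem even when you did not think to check the key.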

Other considerations

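  • Use here (or rprojroot) to build file paths from the project root
  • Use renv (or groundhog) to pin package versions
  • Or use pak, which can also create and install from lockfiles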
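A short sketch of here::here() building the data path used on the stopifnot slide. The renv calls in the comments are real renv functions, listed as a typical sequence rather than run here.

```r
library(here)

# here() finds the project root (via markers such as a .here file,
# an .Rproj file, or a .git directory) and builds the path from there,
# so the same line works from the project root or from a subfolder,
# which matters because Quarto renders each document from its own folder.
data_path <- here("data", "hist01z1_fy2024.xlsx")
data_path

# Typical renv sequence (not run here):
#   renv::init()      # once, to create a project-local library
#   renv::snapshot()  # after installing or updating packages
#   renv::restore()   # on another machine or in CI, to reinstall them
```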

Quick wins

Summarize data

  • gtsummary
  • another option: table1

Invest time improving tasks you do often!
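For comparison, a sketch of a similar summary with table1's formula interface, assuming the table1 package is installed. Variables to summarize go to the left of `|` and the column (stratifying) variable goes to the right; the two records with a missing TRTA are dropped first because they cannot be assigned to a column.

```r
library(table1)
data("rx_adsl", package = "gt")

# Keep only records with an assigned treatment, and make it a factor
adsl <- subset(rx_adsl, !is.na(TRTA))
adsl$TRTA <- factor(adsl$TRTA)

# Summarize age, sex and ethnicity by treatment arm
tab <- table1(~ AGE + SEX + ETHNIC | TRTA, data = adsl)
tab
```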

gtsummary

library(tidyverse)
library(gtsummary)
data("rx_adsl", package = 'gt')
set_gtsummary_theme(theme_gtsummary_compact())
rx_adsl |> 
  select(-starts_with('STUDY'), 
         -USUBJID, -TRTAN, 
         -SCRFREAS) |>
  gtsummary::tbl_summary(by = TRTA)
Characteristic              Placebo, N = 90¹    Drug 1, N = 90¹
ITT Population Flag
    Y                       90 (100%)           90 (100%)
Randomization Flag
    Y                       90 (100%)           90 (100%)
Age                         41 (36, 46)         39 (35, 42)
Age Group
    <40                     36 (40%)            49 (54%)
    >=40                    54 (60%)            41 (46%)
Sex
    Male                    59 (66%)            57 (63%)
    Female                  25 (28%)            32 (36%)
    Undifferentiated        6 (6.7%)            1 (1.1%)
Ethnicity
    Hispanic or Latino      38 (42%)            29 (32%)
    Not Hispanic or Latino  46 (51%)            50 (56%)
    Missing                 6 (6.7%)            11 (12%)
Body Mass Index             26.0 (21.3, 30.3)   26.9 (22.9, 31.8)
Event Flag
    N                       50 (56%)            28 (31%)
    Y                       40 (44%)            62 (69%)
¹ n (%); Median (IQR)
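Two small additions that are often worth making to a table like this, shown as a sketch: include = limits the summarized variables, add_overall() adds an Overall column, and bold_labels() bolds the variable labels. The variable selection is only for illustration.

```r
library(gtsummary)
data("rx_adsl", package = "gt")

tbl <- rx_adsl |>
  tbl_summary(
    by = TRTA,                       # one column per treatment arm
    include = c(AGE, SEX, BLBMI),    # summarize only these variables
    missing = "no"                   # do not list missing counts
  ) |>
  add_overall() |>                   # add an "Overall" column
  bold_labels()                      # bold the variable labels
tbl
```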

describe

data("sp500", package = 'gt')
library(Hmisc)
library(sparkline)
des <- describe(sp500)
print(des, 'continuous')
sp500 Descriptives
7 Continuous Variables of 7 Variables, 16607 Observations
                                                                   Quantiles
Variable   n      Missing  Distinct  Info  Mean        Gini |Δ|    .05         .10         .25         .50         .75         .90         .95
date       16607  0        16607     1     1983-01-14  8031        1953-05-01  1956-08-15  1966-07-09  1983-01-27  1999-07-01  2009-05-28  2012-09-11
open       16607  0        13304     1     484.2       569.1       24.84       43.90       83.88       145.47      952.97      1348.37     1505.60
high       16607  0        13279     1     487.2       572.6       24.84       43.90       84.63       146.62      960.96      1357.25     1513.38
low        16607  0        13274     1     481         565.4       24.84       43.90       83.21       144.58      944.99      1336.68     1495.67
close      16607  0        13271     1     484.3       569.2       24.84       43.90       83.90       145.70      953.35      1348.70     1505.81
volume     16607  0        10855     1     7.97e+08    1.239e+09   1.740e+06   2.390e+06   7.650e+06   7.220e+07   7.952e+08   3.311e+09   4.144e+09
adj_close  16607  0        13271     1     484.3       569.2       24.84       43.90       83.90       145.70      953.35      1348.70     1505.81

describe

rx_adsl Descriptives
2 Continuous Variables of 14 Variables, 182 Observations
                                                                                  Quantiles
Variable  Label            Units  n    Missing  Distinct  Info   Mean   Gini |Δ|  .05    .10    .25    .50    .75    .90    .95
AGE       Age              Years  182  0        26        0.996  40.15  6.738     31.05  33.00  35.25  40.00  45.00  48.00  50.95
BLBMI     Body Mass Index  kg/m2  182  0        182       1.000  26.7   5.718     19.19  19.92  22.64  26.50  30.57  33.47  34.19

rx_adsl Descriptives
7 Categorical Variables of 9 Variables, 182 Observations
Variable  Label                n    Missing  Distinct
TRTA      Actual Treatment     180  2        2
ITTFL     ITT Population Flag  182  0        2
RANDFL    Randomization Flag   182  0        2
AAGEGR1   Age Group            182  0        2
SEX       Sex                  182  0        3
ETHNIC    Ethnicity            182  0        3
EVNTFL    Event Flag           180  2        2

Eye Candy

ggplotly

library(palmerpenguins)
library(plotly)
data(penguins, package = 'palmerpenguins')
p <- ggplot(penguins,
            aes(x = bill_length_mm, y = body_mass_g, 
                colour = species)) +
  geom_point() +
  theme_minimal() +
  scale_x_continuous('Bill Length (mm)') +
  scale_y_continuous('Body Mass (g)')
ggplotly(p)
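A sketch of one common refinement, assuming palmerpenguins and plotly are installed: put extra details in a `text` aesthetic and pass tooltip = "text" so the hover box shows them. ggplot2 does not draw `text`; ggplotly() picks it up for the tooltip. The two penguins without measurements are dropped first.

```r
library(ggplot2)
library(plotly)
data(penguins, package = "palmerpenguins")

# Drop the two penguins with no bill length or body mass
penguins_complete <- subset(penguins,
                            !is.na(bill_length_mm) & !is.na(body_mass_g))

p <- ggplot(penguins_complete,
            aes(x = bill_length_mm, y = body_mass_g, colour = species,
                # custom hover text; <br> is a line break in plotly tooltips
                text = paste0("Island: ", island, "<br>Sex: ", sex))) +
  geom_point() +
  theme_minimal() +
  labs(x = "Bill Length (mm)", y = "Body Mass (g)")

# Show only the custom text in the tooltip
w <- ggplotly(p, tooltip = "text")
w
```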

gt & gtExtras

Use gtExtras::gt_plt_dist() to add a density plot to each row of a table

library(gt)
library(gtExtras)
library(svglite)
car_summary <- mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarize(
    mean = mean(mpg),
    sd = sd(mpg),
    # gt_plt_dist() needs a list-column: one vector of mpg values per row
    mpg_data = list(mpg),
    .groups = "drop"
  )
car_summary %>%
  dplyr::arrange(dplyr::desc(cyl)) %>%
  gt() %>%
  gtExtras::gt_plt_dist(
    mpg_data, 
    type = "density", 
    line_color = "blue", 
    fill_color = "red") %>%
  fmt_number(columns = mean:sd, decimals = 1)
cyl  mean  sd   mpg_data
8    15.1  2.6  (density plot)
6    19.7  1.5  (density plot)
4    26.7  4.5  (density plot)

gt & gtExtras

Or add a percent bar with gtExtras::gt_plt_bar_pct()

mtcars %>%
   head() %>%
   dplyr::select(cyl, mpg) %>%
   dplyr::mutate(mpg_pct_max = round(mpg/max(mpg) * 100, digits = 2),
                 mpg_scaled = mpg/max(mpg) * 100) %>%
   gt() %>%
   gt_plt_bar_pct(column = mpg_scaled, scaled = TRUE)
cyl  mpg   mpg_pct_max  mpg_scaled
6    21.0  92.11        (bar)
6    21.0  92.11        (bar)
4    22.8  100.00       (bar)
6    21.4  93.86        (bar)
8    18.7  82.02        (bar)
6    18.1  79.39        (bar)

Example

An example table that shows the percent efficacy of a treatment both numerically and graphically
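A minimal sketch of such a table, using hypothetical efficacy values (not data from this talk) and gtExtras::gt_plt_bar_pct(). The efficacy column is duplicated so one copy stays as a formatted number and the other becomes a bar; scaled = TRUE tells gt_plt_bar_pct() the values are already on a 0 to 100 scale.

```r
library(gt)
library(gtExtras)

# Hypothetical efficacy estimates, for illustration only
efficacy <- data.frame(
  treatment = c("Drug A", "Drug B", "Drug C"),
  efficacy_pct = c(72.5, 58.0, 41.3)
)

# Second copy of the column, to be drawn as a bar
efficacy$efficacy_bar <- efficacy$efficacy_pct

tab <- efficacy |>
  gt() |>
  fmt_number(columns = efficacy_pct, decimals = 1, pattern = "{x}%") |>
  gt_plt_bar_pct(column = efficacy_bar, scaled = TRUE, fill = "steelblue") |>
  cols_label(treatment = "Treatment", efficacy_pct = "Efficacy",
             efficacy_bar = "")
tab
```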

ggridges

library(tidyverse)
library(ggridges)
mtcars |>
  mutate(cylinders = factor(cyl),
         cylinders = fct_reorder(cylinders, cyl)) |>
  ggplot(aes(x = mpg, y = cylinders, group = cylinders,
             colour = cylinders, fill = cylinders)) +
  ggridges::geom_density_ridges(alpha = 0.3) +
  theme_minimal()
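A variation on the plot above, as a sketch: stat_density_ridges() with quantile_lines = TRUE and quantiles = 2 draws a vertical line at each group's median, which adds a summary statistic without cluttering the ridges.

```r
library(ggplot2)
library(ggridges)

p <- ggplot(mtcars, aes(x = mpg, y = factor(cyl), fill = factor(cyl))) +
  # quantiles = 2 splits each density into halves, i.e. marks the median
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2, alpha = 0.3) +
  labs(y = "cylinders", fill = "cylinders") +
  theme_minimal()
p
```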

In summary

  • Code defensively
  • Start good habits now
  • Learning tools pays dividends